First provide a summary of the paper, and then address the following criteria: quality, clarity, originality and significance. The authors derive a new convex relaxation for the noisy seriation problem (a combinatorial ordering problem in which variables must be ordered on a line such that their pairwise similarities decrease with their distance along the line). Specifically, they use the construction of Goemans [1], based on sorting networks, to optimize over the convex hull of permutation vectors (i.e., the permutahedron) instead of the convex hull of permutation matrices (i.e., the Birkhoff polytope). The new representation reduces the number of constraints from Theta(n^2) to Theta(n log^2 n) and turns out to be significantly faster in practice on some instances of the seriation problem. I think this paper provides a very appealing convex relaxation for the seriation problem, since it enables solving much larger instances (up to several thousand variables with a standard interior-point solver, versus a few hundred with the previous relaxation in [2]).
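For context, here is a minimal sketch (my own toy illustration, not the paper's relaxation or solver) of the combinatorial problem being relaxed: seriation is commonly cast as the 2-SUM problem, minimizing sum_ij A_ij * (pi(i) - pi(j))^2 over permutations pi, so that highly similar items land close together on the line. The matrix and the brute-force search below are hypothetical choices for illustration only.

```python
from itertools import permutations

import numpy as np

def two_sum_objective(A, pi):
    """2-SUM cost of placing item pi[k] at position k on the line."""
    n = len(pi)
    pos = np.empty(n, dtype=int)
    pos[list(pi)] = np.arange(n)          # position of each item on the line
    diff = pos[:, None] - pos[None, :]    # pairwise position gaps
    return float((A * diff ** 2).sum())

def brute_force_seriation(A):
    """Exhaustive minimizer; only feasible for tiny n."""
    n = A.shape[0]
    return min(permutations(range(n)),
               key=lambda pi: two_sum_objective(A, pi))

# A similarity matrix whose entries decay with |i - j| (a Robinson matrix)
# should be recovered in order, up to reversal.
A = np.array([[3., 2., 1., 0.],
              [2., 3., 2., 1.],
              [1., 2., 3., 2.],
              [0., 1., 2., 3.]])
best = brute_force_seriation(A)
```

For n = 4 the exhaustive search is instant, but it scales as n!; the point of the reviewed relaxation is precisely to avoid this blow-up by optimizing over a convex set such as the permutahedron instead.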
We sincerely appreciate the time and the effort the reviewers invested in reading our paper and providing valuable feedback.
We would like to emphasize again the main contribution of our paper. To Reviewer 1: Thank you for the citations and the correction you provided. In our final submission, we will cite [4], and we will also correct all the items mentioned under SPECIFIC REMARKS/TYPOS. The method also has many other applications, such as volume computation and bandit optimization; the preliminary results are attached.
* says "It will be of great help to the improvement of the generalization ability of the
We thank each reviewer for taking the time to thoughtfully comment on our work, and we are glad that they recognize its relevance to NLU tasks, such as teaching autonomous agents to perform tasks by demonstration. R2 wonders: "don't these results just show [...]?" R2 also points out some things that are unclear in the experiments section. It is true that the models also perform well on the random split (A), which we left unsaid but will make explicit. Finally, we thank R2 for pointing out two missing links in Figure 1; we will update them accordingly. C can shed light on this. GECA adds a lot of red squares to the training set.
Thank you for a detailed review. In terms of content, however, we believe that our contributions have been mischaracterized. Q: "The contributions of the paper are very close from the one of [12]." They use a generic probabilistic bound at the core of their analysis, whereas we use the specific dynamics of SGD/SGLD. Furthermore, our assumptions do not imply any of the path-length assumptions in [18]. Reviewer #6: Thank you for an in-depth read of our paper.
Review for NeurIPS paper: Policy Improvement via Imitation of Multiple Oracles
Weaknesses: The highest-priority comments are the P0 comments listed below. P0: - I think you should clarify what you mean by "experts". You allow the definition of experts to include sub-optimal policies, but is there an extent to which you allow them to be suboptimal? I feel this needs to be clarified. If they can be any policy, does this not fall more into the domain of off-policy/batch RL, rather than imitation learning?
Reviews: Visualizing the PHATE of Neural Networks
Update after author response: Taking on faith the results the authors report in their response (namely, the ability to identify generalization performance using only the training set, the results on CIFAR10 and white-noise datasets, and the quantitative evaluation of the task-switching), I would raise my score to a 6. (In fact, if they did achieve everything they claimed in the author response, I would be inclined to give it a 7, but I would need to see all the results for that.)

Originality: I think the originality is fairly high. Although the PHATE algorithm exists in the literature, the Multislice kernel is novel, and the idea of visualizing the learning dynamics of the hidden neurons to ascertain things like catastrophic forgetting or poor generalization is (to my knowledge) novel.

Quality: I think the Experiments section could be substantially improved: (1) For the experiments on continual learning, from looking at Figure 3 it is not obvious to me that Adagrad does better than Rehearsal in the "Domain" learning setting, or that Adagrad outperforms Adam at class learning. Adam apparently does best at task learning, but again, I would not have guessed that from the trajectories.
Reviews: Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations
This paper describes a method for integrating visual and textual features within a self-attention-like architecture. Overall, I find this to be a good paper presenting an interesting method, with comprehensive experiments demonstrating the method's capacity to improve a wide range of models on image captioning as well as VQA. The analysis is informative, and the supplementary materials add further comprehensiveness. My main complaint is that the paper could be clearer about the current state of the art in these tasks and how the paper's contribution relates to it. The paper apparently presents a new state of the art on the COCO image captioning dataset by integrating the proposed method with the Transformer model. It does not, however, report what happens if the method is integrated with the prior state-of-the-art model SGAE: was this tried and shown not to yield an improvement?
Reviews: Learning Hierarchical Priors in VAEs
This paper discusses how to improve on existing methods in which a hand-designed prior can over-regularize the posterior; it seeks to learn a complex prior that captures the latent structure of the data manifold more efficiently. To learn such a prior, the paper adopts and modifies a dual optimization technique and introduces an efficient algorithm for updating the hierarchical prior and posterior parameters. The combination of the complex prior with the introduced algorithm learns a posterior that has a more informative latent representation and avoids posterior collapse. In addition, the paper introduces a graph-based search method to interpolate between states and shows in the experiment section how effectively the algorithm discovers a meaningful posterior. The contributions of this paper can be summarized as follows:
- introducing a hierarchical prior that avoids over-regularization of the posterior while learning the latent-variable manifold;
- adapting and extending an optimization technique and an algorithm to learn the hierarchical prior and hierarchical posterior parameters.
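To make the over-regularization point concrete, the two-level construction the review describes usually takes the following generic form (a standard sketch of hierarchical-prior VAEs, not necessarily this paper's exact objective): the KL term of the ELBO pulls the approximate posterior toward the prior, so a fixed standard-normal prior can over-regularize, whereas here the prior is itself learned through a higher-level latent $\zeta$:

```latex
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
  \;-\; \mathrm{KL}\bigl(q_\phi(z \mid x) \,\|\, p_\theta(z)\bigr),
\qquad
p_\theta(z) \;=\; \int p_\theta(z \mid \zeta)\, p(\zeta)\, \mathrm{d}\zeta .
```

Learning $p_\theta(z)$ jointly with the posterior lets the KL target adapt to the data rather than forcing the posterior toward a fixed Gaussian, which is the mechanism behind the avoided over-regularization and posterior collapse that the review mentions.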